@coolkp coolkp commented Oct 22, 2025

This PR adds:

  1. Optimizer state to the checkpoint (the checkpoint path is saved, and the whole optimizer state is saved).
  2. Loading of the optimizer and step, overwriting the optimizer in the training loop (be careful with shardings; they are inherited).
  3. Resuming training from the last checkpointed step.
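The steps above can be sketched in framework-neutral Python. This is a minimal illustration, not the PR's actual code: the helper names (`save_checkpoint`, `load_checkpoint`, `train`) and the toy momentum optimizer are hypothetical, and a real training setup would use a proper checkpoint library and respect array shardings when restoring optimizer state.

```python
import pickle
from pathlib import Path

# Hypothetical sketch: persist params, optimizer state, and step together,
# then resume training from the last checkpointed step.

def save_checkpoint(path, step, params, opt_state):
    """Save the whole training state (params, optimizer state, step)."""
    with open(path, "wb") as f:
        pickle.dump({"step": step, "params": params, "opt_state": opt_state}, f)

def load_checkpoint(path):
    """Return (step, params, opt_state); (0, None, None) if no checkpoint exists."""
    p = Path(path)
    if not p.exists():
        return 0, None, None
    with open(p, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["params"], ckpt["opt_state"]

def train(ckpt_path, total_steps=10, save_every=5):
    """Toy loop: resume from the checkpointed step, overwriting the fresh
    optimizer state with the loaded one (mirroring step 2 above)."""
    step, params, opt_state = load_checkpoint(ckpt_path)
    if params is None:  # no checkpoint: start from scratch
        params, opt_state = 0.0, {"momentum": 0.0}
    for step in range(step, total_steps):
        grad = 1.0  # stand-in gradient
        opt_state["momentum"] = 0.9 * opt_state["momentum"] + grad
        params -= 0.1 * opt_state["momentum"]
        if (step + 1) % save_every == 0:
            save_checkpoint(ckpt_path, step + 1, params, opt_state)
    return step + 1, params, opt_state
```

Calling `train` once to step 5, then again with a larger `total_steps`, continues from step 5 rather than restarting, because the loaded step, params, and optimizer state replace the freshly initialized ones.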

@coolkp coolkp merged commit 662d501 into main Oct 22, 2025
2 of 3 checks passed
